Smart Paper Notes

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Text Revealer: Private Text Reconstruction via Model Inversion Attacks against Transformers

Ruisi Zhang , Seira Hidano , Farinaz Koushanfar

Category: Natural Language Processing

2022-09-21

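Text classification is widely used in natural language processing applications such as sentiment analysis. Current applications often use large Transformer-based language models to classify input texts. However, there is a lack of systematic study of how much private information can be inverted when models are published. In this paper, we formulate Text Revealer, the first model inversion attack for text reconstruction against Transformer-based text classification. Our attacks faithfully reconstruct private texts included in the training data, given access to the target model. We leverage an external dataset and GPT-2 to generate fluent text that resembles the target domain, and then optimally perturb its hidden state using feedback from the target model. Our extensive experiments show that the attacks are effective on datasets with different text lengths and can reconstruct private texts accurately.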

Emergency-braking Distance Prediction using Deep Learning

Ruisi Zhang , Ashkan Pourkand

Category: Robotics

2021-12-03

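Predicting emergency-braking distance is important for collision-avoidance features, which are among the most essential and popular safety features of vehicles. In this study, we first gathered a large dataset of three-dimensional acceleration data and the corresponding emergency-braking distances. Using this dataset, we propose a deep learning model to predict emergency-braking distance that requires only 0.25 seconds of three-dimensional vehicle acceleration data as input. We consider two road surfaces; our deep learning approach is robust to both and is accurate to within 3 feet.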

Improving Differentiable Architecture Search with a Generative Model

Ruisi Zhang , Youwei Liang , Sai Ashish Somayajula , Pengtao Xie

Category: Machine Learning | Computer Vision

2021-11-30

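In differentiable neural architecture search (NAS) algorithms such as DARTS, the training set used to update model weights and the validation set used to update the model architecture are sampled from the same data distribution. As a result, uncommon features in the dataset do not receive enough attention during training. In this paper, instead of introducing a more complex NAS algorithm, we explore the idea that adding a high-quality synthesized dataset to training can help the classification model identify its weaknesses and improve recognition accuracy. We introduce a training strategy called "Differentiable Architecture Search with a Generative Model (DASGM)". In DASGM, the training set is used to update the classification model's weights, while a synthesized dataset is used to train its architecture. The generated images follow a different distribution from the training set, which can help the classification model learn better features and identify its weaknesses. We formulate DASGM as a multi-level optimization framework and develop an effective algorithm to solve it. Experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate the effectiveness of DASGM. Code will be made available.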

TreeGAN: Incorporating Class Hierarchy into Image Generation

Ruisi Zhang , Luntian Mou , Pengtao Xie

Category: Computer Vision | Artificial Intelligence

2020-09-16

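Conditional image generation (CIG) is a widely studied problem in computer vision and machine learning. Given a class, CIG takes the name of the class as input and generates a set of images that belong to it. In existing CIG work, the images for different classes are generated independently, without considering the relationships among classes. In real-world applications, classes are organized into a hierarchy, and their hierarchical relationships are informative for generating high-fidelity images. In this paper, we aim to leverage the class hierarchy for conditional image generation. We propose two ways of incorporating the class hierarchy: prior control and post constraint. In prior control, we first encode the class hierarchy and then feed it as a prior into the conditional generator to generate images. In post constraint, after the images are generated, we measure their consistency with the class hierarchy and use the consistency score to guide the training of the generator. Based on these two ideas, we propose the TreeGAN model, which consists of three modules: (1) a class hierarchy encoder (CHE), which takes the hierarchical structure of the classes and their textual names as input and learns an embedding for each class that captures the hierarchical relationships among classes; (2) a conditional image generator (CIG), which takes the CHE-generated embedding of a class as input and generates a set of images belonging to that class; (3) a consistency checker, which performs hierarchical classification on the generated images and checks whether they are compatible with the class hierarchy; the consistency score is used to guide the CIG toward generating hierarchy-compatible images. Experiments on various datasets demonstrate the effectiveness of our method.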
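
As a rough illustration of the post-constraint idea, the sketch below scores how many generated images are classified onto the same leaf-to-root path as the class they were generated for. The abstract does not say how the consistency score is computed, so the exact-path-match rule, the `parent` dictionary encoding of the hierarchy, and the function names are all assumptions made for this example.

```python
from typing import Dict, List, Sequence


def ancestors(cls: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from `cls` up to the root, inclusive (assumes a tree)."""
    path = [cls]
    while cls in parent:
        cls = parent[cls]
        path.append(cls)
    return path


def consistency_score(
    target_class: str,
    predicted_paths: Sequence[Sequence[str]],
    parent: Dict[str, str],
) -> float:
    """Fraction of images whose predicted leaf-to-root path matches the target's.

    `predicted_paths` stands in for the output of a hierarchical classifier;
    exact path matching is an assumed scoring rule, not the paper's.
    """
    expected = ancestors(target_class, parent)
    hits = sum(1 for path in predicted_paths if list(path) == expected)
    return hits / len(predicted_paths)


# Hypothetical toy hierarchy: child -> parent.
PARENT = {"husky": "dog", "tabby": "cat", "dog": "animal", "cat": "animal"}

predictions = [
    ["husky", "dog", "animal"],
    ["husky", "dog", "animal"],
    ["tabby", "cat", "animal"],
    ["husky", "cat", "animal"],  # leaf and parent disagree with the hierarchy
]
score = consistency_score("husky", predictions, PARENT)
```

In a training loop, a score like this would be one possible signal for steering the generator toward hierarchy-compatible images.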

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Jason Alan Fries , Leon Weber , Natasha Seelam , Gabriel Altay , Debajyoti Datta , Samuele Garda , Myungsun Kang , Ruisi Su , Wojciech Kusa , Samuel Cahyawijaya

Category: Natural Language Processing

2022-06-30

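Training and evaluating language models increasingly requires the construction of meta-datasets: diverse collections of curated data with clear provenance. Natural language prompting has recently improved zero-shot generalization by transforming existing supervised datasets into a variety of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While these data-centric approaches have succeeded on general-domain text, translating them to biomedical language modeling remains challenging, because labeled biomedical datasets are significantly underrepresented in popular data hubs. To address this challenge, we introduce BigBIO, a community library of 126+ biomedical NLP datasets, currently covering 12 task categories and 10+ languages. BigBIO facilitates reproducible meta-dataset curation through programmatic access to datasets and their metadata, and is compatible with current platforms for prompt engineering and end-to-end few/zero-shot language model evaluation. We discuss our process for task schema harmonization, data auditing, and contribution guidelines, and outline two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale multi-task learning. BigBIO is an ongoing community effort and is available at https://github.com/bigscience-workshop/biomedical.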

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

Category: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, or not at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, and sequential distillation, revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
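
To make the first finding more concrete, here is a minimal sketch of what a token-relation distillation loss could look like. The abstract does not define how relations are computed, so this example assumes they are row-wise softmaxes over scaled dot products between token features, compared with a KL divergence; the function names and the toy inputs are hypothetical.

```python
import math
from typing import List

Tokens = List[List[float]]  # one feature vector per token


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(tokens: Tokens) -> List[List[float]]:
    """N x N matrix: softmax of scaled dot products between all token pairs."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        for q in tokens
    ]


def relation_distillation_loss(student: Tokens, teacher: Tokens) -> float:
    """Mean KL(teacher || student) over the rows of the relation matrices."""
    s_rel = token_relations(student)
    t_rel = token_relations(teacher)
    total = 0.0
    for t_row, s_row in zip(t_rel, s_rel):
        for t, s in zip(t_row, s_row):
            if t > 0.0:  # skip terms that contribute nothing to the KL sum
                total += t * math.log(t / max(s, 1e-12))
    return total / len(t_rel)


# Toy example: two tokens; the student is narrower than the teacher.
teacher_tokens = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
student_tokens = [[0.2, 0.1], [0.4, 0.3]]
loss = relation_distillation_loss(student_tokens, teacher_tokens)
```

One property worth noting in this formulation: the relation matrices are N x N regardless of feature width, so the student and teacher do not need matching hidden dimensions, whereas direct feature matching would need a projection.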

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
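
The NAIVEATTACK description (adding triggers to the raw data before distillation starts) can be sketched roughly as follows. The abstract does not give the trigger shape, its position, the poisoning ratio, or the relabeling rule, so the bottom-right square patch, the `every` parameter, and the target-label relabeling below are assumptions for illustration only. The distillation step itself is not shown.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of intensities in [0, 1]


def stamp_trigger(image: Image, patch_size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of `image` with a square patch set in the bottom-right corner."""
    height, width = len(image), len(image[0])
    if patch_size > min(height, width):
        raise ValueError("patch does not fit inside the image")
    poisoned = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            poisoned[r][c] = value
    return poisoned


def poison_dataset(
    images: Sequence[Image],
    labels: Sequence[int],
    target_label: int,
    every: int = 10,
) -> Tuple[List[Image], List[int]]:
    """Stamp the trigger on every `every`-th sample and relabel it to `target_label`."""
    out_images: List[Image] = []
    out_labels: List[int] = []
    for index, (image, label) in enumerate(zip(images, labels)):
        if index % every == 0:
            out_images.append(stamp_trigger(image))
            out_labels.append(target_label)
        else:
            out_images.append([row[:] for row in image])
            out_labels.append(label)
    return out_images, out_labels


# Toy data: twenty blank 4x4 images, all of class 1; the attacker targets class 0.
blank = [[0.0] * 4 for _ in range(4)]
poisoned_images, poisoned_labels = poison_dataset(
    [blank] * 20, [1] * 20, target_label=0, every=10
)
```

In this reading, the poisoned raw set would then be handed to the dataset distillation procedure unchanged. DOORPING differs in that it keeps updating the trigger throughout distillation, which this static sketch does not attempt to model.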

PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment

Qingyi Pan , Ning Guo , Letu Qingge , Jingyi Zhang , Pei Yang

Category: Computer Vision

2023-01-03

Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across scales and make the regression problem for BIQA harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and to optimize the regression in a way that follows the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
